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Abstract 

The Gaussian Graphical Model (GGM) is a popular tool for incorporating sparsity 
into joint multivariate distributions. The G-Wishart distribution, a conjugate prior for 
precision matrices satisfying general GGM constraints, has now been in existence for 
over a decade. However, due to the lack of a direct sampler, its use has been limited 
in hierarchical Bayesian contexts, relegating mixing over the class of GGMs mostly to 
situations involving standard Gaussian likelihoods. Recent work, however, has devel- 
oped methods that couple model and parameter moves, first through reversible jump 
methods and later by direct evaluation of conditional Bayes factors and subsequent 
resampling. Further, methods for avoiding prior normalizing constant calculations-a 
serious bottleneck and source of numerical instability-have been proposed. We review 
and clarify these developments and then propose a new methodology for GGM com- 
parison that blends many recent themes. Theoretical developments and computational 
timing experiments reveal an algorithm that has limited computational demands and 
dramatically improves on computing times of existing methods. We conclude by de- 
veloping a parsimonious multivariate stochastic volatility model that embeds GGM 
uncertainty in a larger hierarchical framework. The method is shown to be capable 
of adapting to the extreme swings in market volatility experienced in 2008 after the 
collapse of Lehman Brothers, offering considerable improvement in posterior predictive 
distribution calibration. 
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1 Introduction 



The Gaussian graphical model (GGM) has received widespread consideration (see Jones 



et al. , 2005 ) and estimators obeying graphical constraints in standard Gaussian sampling 



were proposed as early as Dempster (1972). Initial incorporation of GGMs in Bayesian 



estimation largely focus on decomposable graphs (Dawid and Lauritzen, 1993), since prior 



distributions factorize into products of Wishart distributions. Roverato (2002) proposes a 
generalized extension of the Hyper-Inverse Wishart distribution for covariance matrices S 



over arbitrary graphs. Atay-Kayis and Massam (2005) turn this into a prior specified for 
precision matrices K and outline a Monte Carlo (MC) method that enables pairwise model 



comparisons. Following Letac and Massam (2007) and Rajaratnam et al. (2008), Lenkoski 



and Dobra (2011) term this distribution the G- Wishart, and propose computational im- 



provements to direct model comparison and model search. 

A number of samplers for precision matrices under a G- Wishart distribution have been 



proposed. These involve either block Gibbs sampling (Piccioni, 2000), Metropolis-Hastings 



(MH) moves (Mitsakakis et al., 2011 Dobra and Lenkoski, 2011 Dobra et al. , 2011), or 



rejection sampling (Wang and Carvalho, 2010). Dobra et al. (2011) shows that the rejection 



sampler of Wang and Carvalho (2010) suffers from extremely low acceptance probabilities 



in even moderate dimensions. Wang and Li (2012) conclusively show that block Gibbs sam- 
pling is both computationally more efficient and exhibits considerably less autocorrelation 
than the MH methods. 

The block Gibbs sampler provides a Markov chain Monte Carlo (MCMC) sample. When 
the likelihood assumes standard Gaussian sampling, determining posterior expectations of 



K can technically be performed as in Lenkoski and Dobra (2011), whereby model probabil- 
ities are first directly assessed via stochastic search, and model averaged samples are then 
collected using block Gibbs over each model. However, when the GGM is specified over la- 
tent data in a hierarchical Bayesian framework, such an approach is no longer valid. This is 
due to the use of the matrix K in updating other hyperparameters as well as its involvement 
in updating the latent Gaussian factors. 



Dobra and Lenkoski (2011) propose a reversible jump MCMC (which for brevity we refer 



to as RJ) method (Green, 1995) that simultaneously updates the GGM and its associated 



K, and embed the GGM in a semiparametric Gaussian copula. Dobra et al. (2011) expand 
the RJ algorithm and show how GGMs may be used to model dependent random effects 
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in a generalized linear model context, focusing on lattice data. Wang and Li (2012) utilize 
conditional properties of G-Wishart variates that enables model moves through calculation 
of a conditional Bayes factor (CBF) ( |Dickey and Gunel , 1978) and subsequently update K 
through direct Gibbs sampling. Wang and Li (2012) also explore the use of a double MH 



algorithm (Liang, 20101) to avoid the computationally expensive and numerically unstable 



MC approximation of normalizing constants proposed by Atay-Kayis and Massam (2005). 

We investigate an alternative method for simultaneously updating the GGM and asso- 
ciated K in hierarchical Bayesian settings. Our method is built on the framework outlined 
in Wang and Li (2012), but blends several of the developments above to yield an algorithm 



with considerably less computational cost. We show that use of a CBF on the Cholesky 



decomposition of a permuted version of K , originally proposed in an early version of Wang 



and Li (2012), enables fast model moves; use of methods for sparse Cholesky decompositions 



(Rue, 2001) further reduce computational overhead. We then show that while both Dobra 



et al. (2011) and Wang and Li (2012) indicate the random walk MH sampler of Mitsakakis 



et al. (2011) is not suitable for posterior sampling, it is especially useful in the context of 



the double MH algorithm. We are therefore able to specify a model move algorithm which 
requires little computational effort and exhibits no numerical instability. 



Simulation experiments compare our new algorithm to the algorithm of Wang and Li 



(2012) (which we refer to as WL). Both methods perform equally well at determining poste- 
rior distributions. However, we show that while the WL approach is theoretically appealing, 
it suffers significant computational overhead on account of many matrix inversions. By con- 
trast, our new approach exhibits a dramatic improvement in computation time. 

We conclude with an example of how GGMs may be embedded in hierarchical Bayesian 
models. GGMs have been shown to yield parsimonious joint distributions useful in financial 



applications (Carvalho and West, 2007; Rodriguez et al. 2011). However, existing studies 
have largely ignored heteroskedasticity in financial data and relied on datasets taken over 
periods with relatively little financial turmoil. To address this, we propose a parsimonious 
multivariate stochastic volatility model that incorporates GGM uncertainty. We then model 
stock returns for 20 assets during the period surrounding the financial crash of 2008. We show 
that in the periods leading up to the crash, and 9 months after the most turbulent period, 
this method yields no improvement over an approach that does not model heteroskedasticity. 
However, during the period of heightened volatility, our new model is able to adapt quickly 
and yields considerably more calibrated predictive distributions. 
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The article is organized as follows. In Section [2] we review the G-Wishart distribution, 
establish results necessary for CBF calculations and describe the block Gibbs sampler. Sec- 
tion [3] conducts a simulation study showing the computational advantage gained by our new 
algorithm. In Section [4] we describe our multivariate graphical stochastic volatility model 
and give results over the data mentioned above. We conclude in Section |5j 



2 The G-Wishart Distribution 

2.1 Review of Basic G-Wishart Properties 

Suppose that we collect data V = {Z (1) , . . . , Z (n) } such that Z {j) ~ W p (0, K' 1 ) indepen- 
dently for j G {1, . . . , n}, where K G P, the space of p x p positive definite matrices. This 
sample has likelihood 



pr(V\K) = (2it)- np l 2 \K\ n ' 2 exp (~(K, U)^ 



where {A, B) = tr(A'B) denotes the trace inner product and U = Y^h=i ■ 

Further suppose that G = (V, E) is a GGM where V = {1, . . . ,p} and E C V x V. We 
will slightly abuse notation throughout, by writing G G to indicate that the edge 
is in the edge set E. Associated with G is a subspace Pg C P such that K G Pq implies 



that K G P and = whenever G. The G-Wishart distribution (Roverato, 2002 



Atay-Kayis and Massam, 2005) Wg(5, D) assigns prior probability to K G Pg as 

The normalizing constant Iq is in general not known to have an explicit form, and Atay- 



Kayis and Massam (2005) propose an MC approximation for this factor. Furthermore, the 
G-Wishart is conjugate and thus pr(K [D, G) = Wg(5 + n, D*) where D* = D + U '. 

Let $ be the upper triangular matrix such that = K, the Cholesky decomposition. 



Rue (2001) notes that we may associate with G another graph F, called the fill-in graph, 
such that G C F, = when F and 



1=1 
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when i < j and € F\G. Rue (2001) outlines a straightforward method for constructing 
a graph F from G and explains how use of node reordering software can minimize fill-in. 



Roverato| ( |2002[ ) shows that if K ~ >V G (5, £>) then 



WAG) = fK^" 1 ^ 

1=1 ^ ' 



(2) 



where vf is number of nodes in {i + 1, . . . , p} that are connected to node i in G. We especially 
note that if K ~ W G (5,I P ), then 



\ (ij')e-F / »=1 77 



(3) 



2.2 Sampling Methods 

We review two MCMC methods for approximate sampling from a Wg(5, D). See Dobra 



et al. (2011) and Wang and Li (2012) for more detailed reviews of the many methods that 



have been proposed. 

Let C denote the cliques of G. In the following, we consider a clique to be a maximally 



complete subgraph, though Wang and Li (2012 ) note that this can be relaxed to any complete 



subgraph. Piccioni (2000) shows that for any C G C, 



Kc - K C y\cK v \ c Kv\c,c 



W(8,D C ), 



(4) 



where W denotes a standard Wishart variate. The expression Q thereby gives the full 
conditionals of Wg(8, D). The block Gibbs sampler thus cycles through C, updating each 
component using Q. |Wang and Li] pH2] ) convincingly show that for posterior inference 
of Wg(<5 + n,D*) the block Gibbs sampler outperforms all other proposed methods, both 
in terms of computing time and mixing. The authors also provide a useful review of the 
algorithm and indicate its broad flexibility. Throughout, we use the block Gibbs sampler for 
updating the matrix K in the posterior. 



Both Dobra et al. (2011) and Wang and Li (2012) show that the random walk MH 



(RWMH) algorithm of Mitsakakis et al. (2011 ) is unsuitable for posterior inference. However, 



we show below that it is especially effective to use in the double MH algorithm. In particular, 
suppose that we wish to sample from Wg(<5, I p ) and K is the current state of an MCMC 
chain. Then the RWMH algorithm performs the following 
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1. Determine $ from K 



2. Propose 

a. Sample c ~ Xs +L ,a and set ^ = c 1 / 2 

b. Sample *y ~ jV(0, 1) for G G 

c. Complete ^ using ([I]) for all (i, j) <E F \ G,i < j 

3. Compute 



a = exp 



(-2 S (*« 



(5) 



4. With probability min{a, 1} set K = 



The appeal of the RWMH algorithm in sampling from Wg{S, I p ) is the simplicity of the factor 
in (|5]). Through the use of node reordering software, which minimizes the size of F\G, this 
expression may require few calculations. While the algorithm does not perform well in the 
posterior, and the calculation ^ as well as step (2.c) become more involved when D ^ I p we 
show below that in the particular case of the double MH algorithm using the prior Wg(5, I p ), 
this method is extremely useful. 



2.3 Conditional Bayes Factors 

rnor to |Wang and G| Q2012fl , model moves between two graphs G and G' focused on ap- 
proximating the ratio 

pr(G\V) pr(V\G) pr(G) 

x (6) 



pr(G'\V) pr(V\G') pr(G 



A ' 



first through MC (Atay-Kayis and Massam, 2005| Jones et al. , 2005), then a combination 



of MC and Laplace approximation (Lenkoski and Dobra, 2011) and ultimately through RJ 



(Dobra and Lenkoski, 2011; Dobra et al., 2011). 



Suppose that G C G' which differ only by the edge e 
Let K e 
form 



e G' and that K e P G . 



K \ {Kij, Kji, Kjj}. In lieu of (|6j), Wang and Li (2012) consider ratios of the 

(7) 



pr(G\K- e ,V) _ pr(V,K- e \G) pr(G) 
pr(G'\K- e ,V) ~ pr{V,K~ e \G') X pr(G') 
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which are related to the conditional Bayes factors (CBFs) of Dickey and Gunel (1978). 
Using properties related to the form ^ Wang and Li (2012) show that 



pr(V,K- e \G) 
pr(V,K- e \G') 



H{5 + n,e,K~ e ,D* 



where, in general 

H(d,e,K- e ,S) 



I(d, S j:j ) 



J(d, S ee , An 
where I{b, c) = C - b / 2 2 b / 2 T{b/2), 




(<2-2)/2 



cxp 



-\{S,K°-K') 



J(h,B,b) 



2tt 

E>22 



1/2 



Q-i) 

b 2 I(h,B 2 2)exp 



B u - 



Bj2 

B22 



such that A = K r 



K e y\ e K V \ e K e y\ e . The matrix K° is equal to K except that K^- 



K 3,v\j K v\j K 3V\j and K ij 



V\e J 

K°- 

31 



0. Finally, the matrix K 1 is equal to K except that 



Kl 



K e,V\eK y\ e K e y\ e . 



By using the CBF in (|8j), Wang and Li (2012) propose model moves that do not rely on 
RJ methods, and after assessing which graph to move to, the parameter Kjj, as well as 
if e is in the accepted graph, are resampled according to their conditional distributions given 
K~ e . This method is appealing, as it offers an automatic manner of moving between graphs 



and does not rely on the tuning parameters used in the RJ methods of Dobra and Lenkoski 



(2011 


) and 


Dobra et al. 


(2011) 



While the result has significant theoretical appeal we show that computation of the factor 
H(5+n, e, K~ e , D*) is extremely costly, even in low dimensions. This is due to the formation 
of the matrices K° and K , which require the solution of systems involving large matrices, 
in particular, K v \j and K v \ e - 

Suppose now that G and G' differ only by the edge / = (p — l,p) again with G C G' . 
We consider the CBF 

pr(V, &~ f \G') 
pr(V,<f>- f \G) ' 

where $ = <& \ {$ p _i iP , $ pp }. In Appendix A we show that 



pr 
pr 



(V\<f>~f,G) 



Ig>(5,DY 



(9) 
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with, in general 



N(*- f ,S) = Vi,p-i 



where \i = § p -i; P -iS p - l)P / S pp , and 



2vr 
ST 



1/2 



p— l,p- 



i ELi 2 $i P -i$z P - 



This result originally appeared in an early version of Wang and Li (2012). In order to 
update a general edge e, we propose determining a permutation i? of V such that the nodes 
of V \e are reordered to reduce the fill-in of the graph Gy\ e and finally, the edge e is placed in 
the (p — l,p) position. Equation ^ is then calculated after permuting, K and D* according 
to 

The benefit of this method is the reduced computational overhead required to compute 
Q. The method requires relabeling the matrices K and D* and determining the Cholesky 
decomposition of the permuted version of K. Using node reordering software to minimize 
fill-in of Gy\e proves useful in the developments below. 



2.4 Avoiding Normalizing Constant Calculation 

Both g and Q require determination of the prior normalizing constants Iq and Iqi. While 



the MC method of Atay-Kayis and Massam (2005) enables these factors to be approximated 



the routine can be subject to numerical instability (Lenkoski and Dobra, 2011 Wang and 



Li, 2012) and involves significant computational effort. 



Wang and Li (2012) propose a method for avoiding the use of the MC approximation 



for prior normalizing constants. Their method employs the double Metropolis Hastings 



algorithm of Liang (2010), which is an extension of the exchange algorithm developed by 



Murray et al. (2006). 



We briefly review the implementation of the double MH algorithm in Wang and Li (2012 ). 
Suppose that (K, G) is the current state of the MCMC chain and we propose to move to G 1 
by adding the edge e to G. The double MH algorithm then forms a copy K of K, resamples 
Kij and Kjj according to G'. It then updates K via block Gibbs according to G' . Equation 
(J8j) is then replaced with 

H{8 + n,e,K~ e ,D*) 



H(S,e,K ,D) 



(10) 



We see that the expression (j 10|) has replaced the prior normalizing constants with an evalu- 

for theoretical 



ation of H in the prior, evaluated at K (see Murray et al. 



Murray et al. 


2006 


Liang 


2010 
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justifications of this procedure). This is clearly beneficial, as it avoids the need for involved 



MC approximation. Unfortunately, the procedure as implemented in Wang and Li (2012) 
requires a full run of the block Gibbs sampler, as well as determination of the matrices K° 
and K 1 and therefore contains many large matrix operations. 

We propose an alternative implementation of the double MH algorithm. Again sup- 
pose that (K, G) is the current state and we propose to move to G' by adding the edge 
/ = (p — l,p). We first determine $ from K. We then update $ to $ using the RWMH 
algorithm in Section 2.2 relative to G' . Equation ^ is then replaced with 

N(<$>- f ,D*) 



11) 



We can immediately see that the expression (11) is considerably simpler that (10); it 
requires no additional matrix inversions nor the evaluation of any trace inner products. 
Furthermore, the generation of the auxiliary variables through the RWMH is considerably 
less demanding computationally than the use of the Block Gibbs sampler, especially when 
D — I p , a common setting in practice. 

2.5 Algorithms for Full Posterior Determination 

In this section we outline the two algorithms we will consider for full posterior determination. 
Both algorithms create a sequence { (K [1] , GW ) , . . . , (K [s] ,G&)} where K w G P G w . Given 
the current state (K^ S \G^) the WL algorithm proceeds as follows 

0. Set K = and G = 

1. For each edge e, do: 

a. if e ^ G attempt to update G to G' = G U e with probability 

q{G'\K- £ , V) pr{G')H{5 + n, e, K e , D*) 



q(G\K- £ ,V) ' pr(G) 
if e G G the ratio is flipped. If G is not to be updated, skip to step c. 
b. If attempting to update G to G', sample K as discussed in Section 2.4 and 



calculate 
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if e G G', otherwise calculate 



a = min{l, H(6,e,K,D)} 

and with probability a set G = G', otherwise leave it unchanged, 
c. Resample Kij,Kjj according to G. 

After attempting to update each edge, set G^ s+1 ^ = G. 

2. Resample K^- s+1 ^ using the block Gibbs sampler relative to G^ s+1 ^ and the current state 
of K. 

We see that in one iteration of the WL algorithm, each edge is potentially updated in the 
graph. Our new algorithm (which we call CL) also follows this idea, and proceeds as follows 

0. Set K = K [s] and G = G w 

1. For each edge e, do: 



2.4 



and 



a. Determine a permutation i? of V p as discussed in Section 2.3, which places the 
edge e in the (p— l,p) = f position, and likewise permute K, G, D and D* . Let 
G denote the permuted version of G and $ be the Cholesky decomposition of 
the permuted version of K. If f £ G & attempt to update Cr to G' = G^ U / with 
probability 

q(G'\®- f ,V) _pr(G')N($- f ,D*) 
q(G»\$- f ,V) ~ pr(G») 

if / G G® the ratio is flipped. If G® is not to be updated then, skip to step c. 

b. If attempting to update G to G', sample $ as discussed in Section 
calculate 

a = miB{l,JV~ 1 (*~ / ,i5)} 
if / G G', otherwise calculate 

a = mm{l,N($~ f ,D)} 

and with probability a set G® = G ', otherwise leave it unchanged. 

c. Resample $ pp according to G® . Then reform K and G by unpermuting 
the system. 
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After attempting to update each edge, set G^ +1 ^ = G. 

2. Resample _K~t s+1 l using the block Gibbs sampler relative to G^ s+1 ^ and the current state 
of K. 

As we can see, there is somewhat more bookkeeping involved in the implementation of the 
CL algorithm, as the system is constantly being permuted. However, the reduction in com- 
putation time by using the RWMH algorithm and requiring only the calculation of the factors 

— f — / 

iV(<fr ,D*) and iV(<& , D) is dramatic, as we show below. 



3 Simulation Study 

In this section we conduct a simulation study that compares the method we have developed 



to the WL algorithm. Our example comes directly from Wang and Li (2012). We consider 

nA~ l where n 



a situation in which p = 6 and let U = YY' 



6; ^4i,i+i — A. 



.5 for i — 1, . . . , 5 and vl 16 = A, 



61 



18 and An = 1 for 
.4. We finally assume the 



prior K ~ Wg(3,Is). Using exhaustive MC approximation of the entire graph space, Wang 



and Li (2012) show that the posterior probability of each edge is 

/ 



iPa\A) 



\ 



1 


0.969 


0.106 


0.085 


0.113 


0.85 


0.969 


1 


0.98 


0.098 


0.081 


0.115 


0.106 


0.98 


1 


0.982 


0.0098 


0.086 


0.085 


0.098 


0.982 


1 


0.98 


0.106 


0.113 


0.081 


0.098 


0.98 


1 


0.97 


0.85 


0.115 


0.086 


0.106 


0.97 


1 



/ 



We use this example and compare the CL algorithm to the WL algorithm. Following 



Wang and Li (2012) we run both the WL and CL algorithms as described in Section 2.5 for 
60, 000 iterations and discard the first 10, 000 iterations as burn-in. Both algorithms were 
implemented in R, though C++ was used for block-Gibbs updates. We note that if a pure 
R implementation had been used, the time differences between WL and CL would be even 
more dramatic. 

We record the total computing time and looked at the mean squared errors of the poste- 
rior inclusion probabilities from the two runs compared with the true values given above. We 
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Table 1: Comparison of CL and WL algorithms for the six dimensional example. 





Time (sec) 


MSE 




Mean SD 


Mean SD 


CL 


182.5 (4.1) 


0.0088 (6e-04) 


WL 


818.4 (19.2) 


0.0349 (0.0025) 



repeated the entire process 100 times, each time starting both WL and CL from the same 
random starting point. Table [I] shows the average computing time in seconds (on a 2.8 GHz 
desktop computer with 4GB of RAM running Linux), average MSE and standard deviations 
across the 100 runs. The first column shows the expected result: even in six dimensions the 
WL algorithm takes more than 4 times as long to perform the same number of iterations as 
the CL algorithm. This shows the improved efficiency of the proposed method. 

We found the results in the third column surprising, but do not draw broad conclu- 
sions from it. It appears that in this example, using 60, 000 iterations, the CL algorithm 
approaches the true posterior edge expectation more quickly than the WL algorithm. Since 
both algorithms are correct theoretically, we choose not to emphasize this result. Further- 
more, we have determined that by doubling the number of iterations, both approaches yield 
essentially the exact posterior distribution, though again the WL algorithm takes more than 
4 times as long to run. 



This example was chosen as it appears in Wang and Li (2012) and has an exact an- 
swer. The fact that the CL algorithm is considerably faster than the WL approach even in 
6-dimensions indicates the broader appeal for searching truly high dimensional spaces. 



4 A Multivariate Graphical Stochastic Volatility Model 



Modeling the joint distribution of returns for a large number of assets is an important compo- 



nent of portfolio allocation and risk management. Carvalho and West (2007) and Rodriguez 



et al. (2011 ) both show that the use of GGMs can substantially improve modeling of joint as- 



set returns. However, heterogeneity in asset returns was, at best, tangentially addressed. The 



study of Carvalho and West (2007) considered a fixed (decomposable) graph throughout the 



entire period and assumed that asset returns were identically distributed. Rodriguez et al. 
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(2011) allowed for mixing over the class of decomposable graphs and also introduced some 
heterosckedasticity by considering an infinite-dimensional hidden Markov model (iHMM). 
However, inside groups of observations in the iHMM, variances were assumed constant. 

Despite these constraints in model assumptions, both studies showed substantial improve- 
ments by incorporating GGMs into the precision matrix associated with joint asset returns. 
We consider a situation in which the notion of homoskedastic, normally distributed asset re- 
turns is simply untenable; namely, the period surrounding the financial turbulence associated 
with the collapse of Lehman Brothers. We show that by utilizing the developments in the 
previous sections, we are able to specify a parsimonious stochastic volatility model for multi- 
variate assset returns that quickly adapts to changes in market volatility. This model shows 
the flexibility of the new approach in embedding the GGM in larger hierarchical Bayesian 
frameworks. 

4.1 The stochastic volatility model 

Let Y t be the log-returns of p correlated assets. We specify the following hierarchical model 
for these returns 

Y t \K, X t ~ Ap(0, exp(X t )K _1 ) (12) 

X t |0,X t _ 1 ,T~JVX«_l,T- 1 ) 

~ Af{0, r ) 
r ~ T(a, b) 
K\G~W G (5,D) 
G~pr(G). 



In the likelihood (12) we see that asset returns are assumed to be mean-zero. The X t 
terms then dictate an overall level of market volatility, while a constant precision parameter 
K dictates the degree to which asset returns are correlated. While this model is parsimo- 
nious, it serves as a useful first departure from previous studies as it explicitly incorporates 
notions of stochastic volatility. For purposes of identification, we set Xq = 0. In the conclu- 
sions section we discuss further possible generalizations to this framework. 

After collecting a time-series of returns Y^ l ' T \ we then aim to determine the posterior 
distribution 

pr{K,G,T,4>,X\Y^) 
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where X = (X 1; . . . , Xx)- Furthermore, we may be interested in the posterior predictive 
distribution pr(Y^ T+1 ^\Y^ 1:T ^). The parameters X,(f),r are updated with standard block 



MH or Gibbs steps (see Rue and Held, 2005) and the posterior predictive distribution is 
easily formed from these parameters. However, we note in particular that 



exp(V t 



(13) 



From (13) we see why the developments in Section 2 prove useful. We may update K 



and G jointly using the CL algorithm discussed in Section 2.5 simply by setting D* = 
D + J2t=i(Y^Y^ )/ exp(Xt). This allows us to easily embed a sparse precision matrix K 
and mix over the class of GGMs in any hierarhical Bayesian model that involves a standard 
Wishart distribution. 



4.2 The data 

To apply our model and algorithm we randomly chose 20 stocks from the S&P 500. These 
stocks were: Aetna Inc. (AET), CA Inc. (CA), Campbell Soup (CPB), CVS Caremark 
Corp. (CVS), Family Dollar Stores (FDO), Honeywell Int'l Inc. (HON), Hudson City 
Bancorp (HCBK), JDS Uniphase Corp. (JDSU), Johnson Controls (JCI), Morgan Stanley 
(MS), PPG Industries (PPG), Principal Financial Group (PFG), Sara Lee Corp. (SLE), 
Sempra Energy (SRE), Southern Co. (SO), Supervalu Inc. (SVU), Thermo Fisher Scientific 
(TMO), Wal-Mart Stores (WMT), Walt Disney Co. (DIS), Wellpoint Inc. (WLP). 

We chose a time period where markets experience both high and low volatility to evaluate 
the flexibility of our model. We chose the time period from October 31, 2001 to May 21, 
2008 as our training period to fit our model and make predictions for the time period from 
May 22, 2008 to October 23, 2009. The time periods consist of 1650 and 360 trading days 
respectively. Figure [I] shows the mean of the squared returns for the these 20 securities over 
the entire dataset. The extreme volatility present in the markets after the collapse of Lehman 
brothers in September 2008 is readily evident, showing that a homoskedasticity assumption 
is untenable for these data. 



4.3 Predictive Performance Results 



We assess the relative performance of the stochastic volatility model we develop in Section 4.1 



versus a method that embedds GGMs, but does not have a stochastic volatility component. 
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Figure 1: Mean of the squared returns taken over all 20 stocks during the entire time period 
from October 31, 2001 to October 23, 2009. 



For each day t + 1 in the forecast period we run our stochastic volatility model from the 
beginning of the training period until the previous day t to obtain estimates for the model 
parameters. We run the algorithm for 60, 000 iterations, discarding the first 10, 000 as burn- 
in (keep in mind that in one iteration of the algorithm all edges are evaluated). Using the 
posterior sample, we obtain the posterior predictive distribution of Y^ t+1 \ 

Figure [2] shows the mean of the posterior predictive distribution of the volatility compo- 
nent X t+ i using the returns up to time t, which drives the predictive distribution of Y^ t+1 ) 
for each day in the forecast period. Comparing Figures [T] and [2] we see that our model reflects 
the time-dependent volatility well. At the beginning when the market is quiet, X t +i takes 
lower values mostly between and 1. After the shock of the financial crisis the volatility 
in the market goes up extremely, which is reflected by significantly higher values of X t +\. 
Months later, towards the end of the forecast period, the market has cooled down and the 
terms X t +i reflect this. 

The two methods we compare both return full predictive distributions. By construction, 
these predictive distributions have the same mean and median since returns are assumed 
mean-zero. Judging their performance therefore requires assessing the entire predictive dis- 



tribution. We assess their performance using the energy score introduced by Gneiting and 



Raftery (2007). 



The energy score is a proper scoring rule, which is a multivariate generalization of the 
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Figure 2: Means of posterior predictive distribution for the volatility component Xt+i from 
May 22, 2008 to October 23, 2009. 

continuous ranked probability score. It is defined as 

ES(F,x) = E F ||X-x|| 2 -^E F ||X-X / || 2 (14) 

where F is our predictive distribution for a vector-valued quantity, X and X' are independent 
random variables with distribution F and x is the realization. 

For each day in the forecast period, we compute the energy score for the predictive 
distribution returned by the two methods considered. Figure [3] shows the difference between 



the stochastic volatility model developed in Section AA_ and the model that incorporates 
GGM uncertainty but holds volatility fixed. As we can see in Figure |3j between May and 
August, 2008, there is no clear difference between the two approaches. However, after the 
financial turbulence in September, 2008, the stochastic volatility model outperforms the 
fixed volatility model by a considerable margin. During almost every day in the turbulent 
period, the energy scores are lower under the stochastic volatility model. After the market 
turbulence subsides, the two models return to performing equally well again. 

This short example shows the utility of the computational methodology developed in 
this paper. The model is simple, in many respects, but a non-trivial deviation from the 
standard iid sampling framework to which the GGM was initially relegated. By now being 
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Figure 3: Difference of energy score from the predictive distribution of the model with 
stochastic volatility versus the model with fixed volatility. Values below zero indicate the 
stochastic volatility model outperformed the fixed volatility model. 

able to embed the GGM in more complicated hierarchical frameworks, we are able to address 
difficult sampling schemes while simultaneously incorporating sparsity in the estimate of joint 
distributions. 

5 Conclusions 

We have synthesized a number of recent results related to the G-Wishart distribution. This 
has allowed for an algorithm that does not rely on RJ methods, obviates the need for expen- 
sive and numerically unstable MC approximation of prior normalizing constants and does 
so with minimal computational effort. The improvement in computation time is sufficient 
that at each stage of the algorithm, all edges may be evaluated for inclusion or exclusion in 
the graphical model. This algorithm allows the GGM to be embedded in more sophisticated 
hierarchical Bayesian models and opens the possibility of replacing standard Wishart dis- 
tributions with G- Wishart variates, leveraging the improvement in predictive performance 
offered by sparse precision matrices. 
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The applied example shows the usefulness of this combination. We are able to sparsely 
model the interactions in financial assets while simultaneously addressing the issues of 
stochastic volatility prevalent in markets undergoing turbulence. The method is able to 
characterize the distribution of asset returns during periods of rapidly fluctuating volatility 
much better than standard iid frameworks. 

The stochastic volatility model we develop remains parsimonious and several adjustments 
could be made. The first such development would be to replace the univariate term X t with 
a multivariate factor that allows the variance of each asset to follow its own path, while 
potentially tying the evolution of these factors together with a separate GGM. Furthermore, 



employing some form of the iHMM framework of Rodriguez et al. (2011 ) could allow for the 
matrix K to change throughout the period as well. Such developments will be considered 
in future work. 
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Appendix A: Determination of CBF for using <E> f 

Consider 

pr(V,$- f \G') 



pr(V,<f>- f \G) 

we note that 



pr(V,<f>- f \G')= j f pr(V,<f>\G')d<f>j 



and 

pr(V,&- f \G)= ! pr{V,<S>\G)d% p 

J $pp 

up to common terms we thus have that 

priV, oc jT exp (-^(Vi, + ^ d%- llP 

recognizing the integral as the kernel of a normal distribution, this yields 



27T N 1/2 



D* 

pp 



Further, again up to common terms 



pr(V, *-f\G) oc exp {-\d;^ q + /i) : 



and thus 

pr(V,*- f \G>) = f Ig(8,D) 

pr(V,®- f \G) 1 ' J I G >(6,D) 
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